SUMMARY PAGE


THE  PROBLEM 

To  evaluate  the  Initial  Strength  Test,  the  Physical  Readiness 
Test,  and  selected  Fleishman  tests  as  meaningful  measures  of 
the  type  of  fitness  required  by  Marine  Corps  combat  troops. 


FINDINGS 

(1) None of these tests appears entirely satisfactory for this purpose.
(2) Types and levels of fitness required by combat troops
have never been defined. (3) Satisfactory tests cannot be developed
until performance criteria have been established.


CONCLUSIONS 

Further work must be undertaken before valid measures of combat
fitness can be developed.


ABSTRACT


The purpose of this investigation was to evaluate the Initial
Strength Test, the Physical Readiness Test, and selected Fleishman
tests as measures of the type of physical fitness required by Marine
Corps combat troops. It was concluded that none of them was satisfactory
for this purpose. There are at present no criteria establishing
the type and level of fitness required by such troops. Satisfactory tests
cannot be developed until such criteria have been established.


A  Critical  Analysis  of  Three  Physical  Fitness  Tests 


INTRODUCTION 

The  investigation  of  physical  fitness  in  the  armed  services 
involves  three  parameters:  (1)  the  determination  of  the  kind  of  fitness 
required  of  combat  troops,  (2)  the  development  of  satisfactory  tests  for 
evaluating  this  type  of  fitness  and  changes  therein,  and  (3)  the  devising 
of  the  most  effective  methods  of  improving  military  physical  fitness. 
At the present time, at least three separate tests for the determination
of fitness are in use in the Marine Corps. The physical condition of
male Marines under 40 years of age is judged by means of the Physical
Readiness  Test;  the  Recruit  Training  Regiments  employ  the  Initial  and 
Final  Strength  Tests;  the  Naval  Medical  Field  Research  Laboratory  is 
using  a  battery  based  on  the  factor  analysis  studies  made  by  Edwin  A. 
Fleishman under a grant from the Office of Naval Research. In the
case of the first two, norms, standard deviations, percentiles, reliability
coefficients, and similar data are not available. Under such conditions,
the level of fitness credited to a given individual or the results attributed
to a given training program may actually be a reflection of the test
used  as  a  criterion.  Conceivably  a  different  opinion  might  have  been 
rendered  if  the  investigators  had  chosen  to  use  one  of  the  other  tests. 
An  essential  preliminary  to  sound  work  in  the  field  of  military  physical 
fitness  is  a  careful  examination  of  the  tools  by  which  it  is  measured 
and  the  relationships  existing  between  them.  It  was  the  purpose  of  this 
study  to  make  a  critical  examination  of  the  three  tests  mentioned  above. 

I. Physical Readiness Test

This is by definition a test to determine whether an individual
meets certain minimum acceptable standards. It is performed in
utilities with boots and helmet, light marching pack, and organic weapon
and belt.1 Since the men with superior fitness have no incentive to perform
the test other than in the easiest manner possible, it does not
generate data by which the individual can be compared with himself or
by which one group can be compared with another.

Event #1 in the test, climbing uphill, consists of stepping on
and off a platform 18 inches high for 60 up and down steps in 3 minutes.
It is stated that "this event simulates marching uphill at a rapid and
steady rate." Whether this is actually the case seems open to some
question.

The event is clearly derived from the Harvard Step Test, which
purports to measure an individual's "general fitness for hard work."
It has been widely used, and equally widely praised and criticized.
U. S. Army investigators reported that the test "is a useful one and
serves to give an approximate overall evaluation of the fitness of a
group of men."3 U. S. Navy evaluators stated the test "affords a convenient
and reliable method for estimating the progress of physical
conditioning and of the degree of improvement in such a program."

On the other hand, Cureton and his co-workers have complained
that the criterion used for its validation "is expressed in an illegitimate
statistical form," includes dependent variables, and is probably invalid.5
Montoye found there is some slight relationship between Step Test
scores and work capacity but it is of little practical importance.6 Henry
and Berg concluded that "physical fitness of the type produced by a
typical athletic training regimen can be measured . . . only to a limited
extent by performances such as . . . stool stepping to exhaustion."

A test of physical condition is of value only if it has been
demonstrated that it correlates highly with physiologic performance in
the event which it is desired to test. Attempts to correlate the Harvard
Step Test with measures of performance or indices of physique have
in general given figures too low to be of predictive value. Some typical
examples drawn from the literature are given in Table 1. All of this
raises considerable question as to the value of this test in predicting
military fitness.


Table 1

Correlation of Harvard Step Test Scores with
Various Other Tests


Test                                  Correlation (r)    Source
                                      with HST           of Data

Army Air Forces' Test                      0.24             3
Army Ground Forces' Test                   0.26             3
Mile Run                                   0.310            8
Cross-country Run (1-3/4 mi.)              0.38             6
Three-mile Run with Marching Pack         -0.21             9
Ruffier Index                             -0.39            10
Pignet Index                               0.14            10
Reciprocal Buffon Index                    0.13            10
Bruce Physical Fitness Index               0.236           11
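
One way to see why correlations of this size have little predictive
value is to square each coefficient, which gives the proportion of
variance that two measures share. The sketch below applies this to the
values in Table 1. The use of r squared is a standard statistical
convention and is not taken from the original report; the function names
are illustrative only.

```python
# Correlations (r) with the Harvard Step Test, copied from Table 1.
TABLE_1 = {
    "Army Air Forces' Test": 0.24,
    "Army Ground Forces' Test": 0.26,
    "Mile Run": 0.310,
    "Cross-country Run (1-3/4 mi.)": 0.38,
    "Three-mile Run with Marching Pack": -0.21,
    "Ruffier Index": -0.39,
    "Pignet Index": 0.14,
    "Reciprocal Buffon Index": 0.13,
    "Bruce Physical Fitness Index": 0.236,
}


def shared_variance(r):
    """Return r squared, the proportion of variance two measures share."""
    return r * r


def summarize(table):
    """Map each test name to the percentage of variance shared with the HST."""
    return {name: round(100 * shared_variance(r), 1) for name, r in table.items()}


if __name__ == "__main__":
    for name, pct in summarize(TABLE_1).items():
        print(f"{name:36s} {pct:5.1f}%")
```

On these figures no test in Table 1 shares more than about 15 percent
of its variance with the Harvard Step Test.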
While many additional correlations of this test with various
other criteria have been reported in the literature, the writers have
been unable to find any occasion on which it has been used to evaluate
ability to move uphill. However, the fact that it has a low correlation
with cross-country running, which ordinarily includes a good deal of
uphill running, may be significant. Rovelli and Aghemo12 specifically
deny that it can be used as a test of ability to develop maximal energy
expenditure, as in running uphill.

The Harvard Step Test has no significant correlation with
height or combined height-weight factors, although extremely heavy men
may make relatively low scores.13 However, Renbourn14 has commented
that the fine scores made by Gurkha troops in the Harvard Pack
Test, which is also one of rapid step climbing, appeared to be related
to their exceptional calf development. It remains to be determined
whether the step test as used in the Physical Readiness Test is also
related to calf development.

The muscle action involved in this test suggests that it might
prove a valid measure of the ability to move through deep mud, such as
is said to characterize the rice paddies of Viet Nam.

Event #2 in this test is a 20-foot rope climb. This item certainly
has "face validity," but there seems to be little information on
precisely what it measures. Fleishman15 submitted a somewhat similar
test to factor analysis and reported a loading of 0.[?]7 with dynamic
strength and 0.41 with explosive strength. In all probability, a quite
similar loading would be found for the Marine Corps version of the rope
climb.


Event #3, evacuation, requires a man to run 50 yards in a
zig-zag fashion, lift a "casualty," and carry him back to the starting
point. The zig-zag run introduces an uncontrolled variable. The fact
that the weight of the "casualty" may vary on each test is a second
uncontrolled variable.


Event #4, advance by fire and maneuver, requires that the
Marine "creeps or crawls" for 25 yards, thus immediately introducing
an uncontrolled variable. He is then to run in a zig-zag fashion, thereby
introducing a second uncontrolled variable.


Since both Events #3 and #4 include two uncontrolled variables,
it is evident that there may be considerable difference in the way in
which they are run by different individuals or even by the same individual
on different occasions. It would be expected that test-retest reliability
would be undesirably low.

Event #5, forced march, requires a 3-mile run-walk carrying
a light marching pack. Like Event #2, this item has face validity.

It will be noted that Events #3, #4, and #5 all require the subject
to run with a load of some kind. This suggests the possibility that
if each of these events were standardized and run for time, a high
intercorrelation would be found between them. In that case the inclusion
of these three events instead of only the best of them adds little or
nothing to the test battery.

Another difficulty is that events of this type may be quite
seriously affected by the terrain. Times on a dry, level, firm surface
should be much lower than those in muddy, sandy, or hilly country.
From the standpoint of practical administration, further difficulties are
often encountered. The 3-mile forced march is sometimes made in
platoon formation, rather than as an individual effort. The individual
events are often given in whatever order is most convenient, with the
exception that the 3-mile forced march is normally the final event in
the series. The difficulty is that each event affects the scores in those
which follow it. Unless the order of administration is the same each
time, the scores cannot be compared.

The minimum requirements for any satisfactory test are that
it be valid, reliable, standardized, and normed for the population
being tested.* Since these requirements are not met by the Physical
Readiness Test, it is impossible to use scores based upon it for any
sort of statistical analysis.


* Reliability refers to the ability of a set of measurements to give
consistent results. The reliability of a certain instrument applies to a
certain population under certain conditions. It is usually reported in
terms of a reliability coefficient which expresses the relationship
between two measurements. A test is valid if it measures what it
purports to measure. This is usually stated as a validity coefficient,
which expresses the relationship between the predictor and the criterion.
For technical reasons, reliability and validity require opposite
approaches in test construction. The result is that the two can never
both be maximal in a single test. In actual practice, a tester may seek
to combine several reliable tests into a valid battery.


II. Initial and Final Strength Tests


The physical fitness of recruits in the Recruit Training Regiment,
Parris Island, is evaluated at the start of their training by means
of the Initial Strength Test and at the end by means of the Final Strength
Test.16 In both tests they wear gym suits and sneakers. The items in
this battery are varied according to the weather. If the weather is suitable,
the recruits are required to perform the following:

1. Pull-ups, with palms out

2. Push-ups

3. Sit-ups

4. Bend and thrust

5. 300-yard shuttle run (60 yards x 5)

In the event of inclement weather, the 300-yard shuttle run is
replaced by side straddle hops, so that actually two forms of this test
exist. Apparently this battery dates back to World War II and is identical
with that used by the Army.17 No intercorrelations of the test
items, means, standard deviations or norms for the scores made by
contemporary Parris Island Marine Corps recruits have been found in
the available literature. Therefore it was first necessary to determine
these statistics. Data collected on a representative number of
recruits were analyzed* and the findings are shown in Table 2. The
combined group consists of 248 men who routinely performed the
300-yard shuttle run as part of their test battery and 256 men (Group B)
who were tested during foul weather and would normally have performed
only the side straddle hops. Since it was impossible to determine the
correlation between these two items from what are normally dichotomous
groups, Group B was required to perform both of these tests within a
day or two of each other. For comparative purposes, similar data
recorded by Bates18 at the San Diego Marine Corps Recruit Depot in
1959 are also displayed. No figures are shown under bend and thrust
and side straddle hops, since the San Diego testers used the squat-jump
in place of these. Simple inspection indicates that there are no great
differences between the two groups in the scores for the other events.

Reference to the scoring table for the Initial Strength Tests,
however, raises certain questions. Presumably the mean figures
represent a point of equal difficulty in each case, and it would be anticipated
that the same number of points would be awarded in each case
for attaining it. As is shown in Table 3, this is not the case. Actually
it appears that the bend and thrust is under-evaluated and the side
straddle hops over-evaluated.


* The authors are indebted to Lt. Col. C. R. Livingston, USMC, Data
Processing Officer, Data Processing Installation No. 2, Marine Corps
Base, Camp Lejeune, and to Capt. E. J. Doran, USMC, Assistant Data
Processing Officer, for designing programs and processing most of the
statistical work in this report.

From the standpoint of test administration, scoring tables of
this type are undesirable, since they yield numerical values which have
no particular significance to either the testee or the tester. Guilford19
recommended centile rank positions* as the most meaningful to the
non-statistician, and there would seem to be little reason to disagree
with his suggestion. Adoption of such a method of scoring would have
the distinct advantage that it would be relatively simple to place a
profile chart on the back of the recruit's score card. This would be of
great assistance in evaluating both the recruit's improvement and the
effectiveness of the training program itself.
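
As an illustration of centile scoring, the sketch below converts a raw
score into a centile rank against a norm group, taking the rank as the
percentage of the norm group scoring below the given score. The norm
scores are invented for the example and are not data from this study;
the function name is likewise hypothetical.

```python
from bisect import bisect_left


def centile_rank(score, norm_scores):
    """Return the percentage of the norm group scoring strictly below `score`."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of norm scores less than `score`
    return 100.0 * below / len(ordered)


# Hypothetical pull-up counts for ten recruits (illustrative only).
norm_group = [2, 3, 4, 5, 5, 6, 7, 8, 9, 12]

if __name__ == "__main__":
    for raw in (5, 7, 12):
        print(raw, "pull-ups ->", centile_rank(raw, norm_group), "centile")
```

For timed events such as the shuttle run, where a lower figure is the
better performance, the same function could be applied to the negated
times so that a high centile still denotes a superior score.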

Table 3

Means and Equivalent Points for Initial Strength Test Items


Test                        Mean            Points

Pull-ups                      5               45
Sit-ups                      46               49
Bend and Thrust              24               32
Push-ups                 [illegible]          50
300-yd Shuttle Run           5?               53
Side Straddle Hops           68           [illegible]


* A centile is a point on a scoring table below which is any given
proportion of scores. That is, 79% of the population will attain a score
less than that represented by the 80th centile.


The scores for each test were intercorrelated by means of the
Pearson r. The results are shown in Table 4. Since four of the tests
were common to both groups, the scores were combined and the overall
r computed. Examination of these figures reveals that they are
satisfactorily orthogonal* with the possible exception of pull-ups versus
push-ups, where a combined correlation of r = 0.52 was obtained.
Working with Great Lakes naval recruits, Fleishman obtained an almost
identical correlation (r = 0.58), and commented that "push-ups added
to pull-ups contributes little new information regarding a subject's
dynamic strength." The same reasoning seems applicable here.


Table 4

Intercorrelations of Initial Strength Test Items

[table body not legible in the source]
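
The intercorrelation of test items described above can be sketched as
follows. The code computes the Pearson r for every pair of items in a
small battery. The scores are invented for illustration and are not the
Parris Island data; the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / sqrt(sxx * syy)


def intercorrelations(scores):
    """Return {(item_a, item_b): r} for every distinct pair of test items."""
    names = list(scores)
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical scores for five recruits (illustrative only).
battery = {
    "pull-ups": [3, 5, 6, 8, 10],
    "push-ups": [20, 22, 30, 28, 35],
    "sit-ups": [40, 50, 45, 48, 44],
}

if __name__ == "__main__":
    for pair, r in intercorrelations(battery).items():
        print(pair, round(r, 2))
```

A pair of items with a high r, as reported here for pull-ups and
push-ups, is a candidate for dropping one of the two from the battery.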

A point of special interest is found in the fact that the correlation
between the 300-yard shuttle run and the side straddle hops is so
low. Whatever the latter measures, it is not the same factor measured
by the former. So far as this item is concerned, the recruits received
during foul weather are given a different test than are those received
during good weather, with the further advantage that points are easier
to earn in the side straddle hop than in the 300-yard shuttle run.


III. Fleishman Tests


In 1958 the Office of Naval Research initiated a project entitled
"The Development of Criteria of Physical Proficiency." This was
assigned to Yale University and was directed by Edwin A. Fleishman.20
A factor analysis of the findings of previous research in the field
identified 14 factors of physical proficiency. The final outcome was a
proposed Fitness Test battery designed to measure 11 factors: explosive
strength, static strength, dynamic strength, trunk strength, extent
flexibility, dynamic flexibility, gross body equilibrium, balance, speed
of limb movement, gross body coordination, and stamina. The present
investigators considered that five of those items were of special interest
for the testing of combat troops: explosive strength (ability to exert
maximum energy in one explosive act), static strength (exertion of a
maximum force for a brief period of time), dynamic strength (strength
exerted repeatedly or continuously over a period of time), trunk
strength, and stamina (cardiovascular endurance during prolonged
exertion of the body). Fleishman recommends
the following as the respective tests of choice for each of these factors:
shuttle run (5 x 20 yards), hand grip with dynamometer, pull-ups, leg
lifts (maximal number in 30 seconds), and 800-yard run-walk. For the
purpose of this paper these will be referred to as the Fleishman Tests.
Norms and centiles are available for each test.20


* The different items measure different qualities. High correlations
between separate events indicate that the tests simply measure the
same thing in different ways.


Experience with this battery revealed certain problems in its
administration.  On  the  basis  of  its  use  at  both  Parris  Island  and  Camp 
Lejeune,  the  shuttle  run  does  not  appear  satisfactory.  The  ground  at 
these  bases  is  soft,  sandy,  and  grassy.  A  number  of  subjects  slide  or 
slip  and  fall  when  reversing  the  direction  of  the  run.  The  first  few 
testees  pound  a  hole  into  the  ground  at  the  point  where  the  change  of 
direction  takes  place.  The  following  runners  simply  plant  their  foot  in 
this  hole  and  pivot,  thereby  gaining  a  mechanical  advantage  which 
obscures  differences  in  actual  speed  and  agility. 

Reference to Fleishman's book shows that his second choice
for the measurement of explosive strength is the 50-yard dash. This
correlates r = 0.80 with performance in the shuttle run and has a loading
of r = 0.75 with the factor identified as explosive strength. This
does not appear to be essentially different from the loading of r = 0.77
found for the shuttle run. The reliability of the two tests is practically
identical: r = 0.86 for the 50-yard dash and 0.85 for the shuttle run.
In view of the close relationship between the two events and the demonstrated
unsatisfactory nature of the shuttle run under our conditions,
experiments will be made in substituting the 50-yard dash for the shuttle
run in future studies.21

Fleishman concluded that the pull-up was the best measure of
dynamic strength and that use of the "under-hand grip" (palms facing
the subject) was preferable to the "over-hand grip" (palms facing away
from the subject), since more pull-ups can be done this way and a
better distribution of scores is obtained. He gives no comparative
figures on the two methods, but his opinion agrees with an earlier Army
Air Force statement to the effect that chinning performances are
superior when the under-hand grip is used,22 and has received support
from an electromyographic study of the muscles involved.23 This
raises a problem in that the Initial Strength Test specifies that pull-ups
are to be done with the palms out.

From the information available in the literature, there appears
to be relatively little difference in the scores for the two styles.
DeWitt,24 using college men as subjects, obtained a mean of 9.71 for
the under-hand style and one of 7.63 for the over-hand grip, a difference
of 2.08 in favor of the former. Experienced testers agree that
unless otherwise directed most of the subjects will choose the former.
It has been suggested that this is because the over-hand grip seems
more fatiguing, although it is not actually more costly in energy.23
However, when the test is administered to Marine Corps personnel in
this fashion, the officers and non-commissioned officers in charge object
to the use of the under-hand grip. It is their contention that in scaling
a wall, swinging into a window, or ascending a roof, the trooper must
use the over-hand grip, and that both as a matter of training and in
fairness to the individual he should be tested as he will perform.

It is, of course, not necessary that the test duplicate the
criterion; only that a satisfactorily high validity correlation exists
between the two. The correlation between these two forms of doing
pull-ups and between test-retest scores when Army Air Force cadets
are used as subjects has been shown to be on the order of r = 0.735
to 0.795. McGraw,27 also studying college men, found means of 9.94
and 8.02 respectively on one occasion and of 10.84 and 9.82 respectively
on a second occasion, from which he concluded that while the under-hand
grip gave the higher scores, the day-to-day variations (d = 0.90
and 1.80 respectively) were apt to be as large as the difference between
grips (d = 1.92 and 1.02 respectively). McGraw reports a test-retest
coefficient of r = 0.73 for the under-hand grip and 0.88 for the over-hand
grip, remarking that the former figure "is well below the value
usually accepted for retest reliability." The coefficient of r = 0.73 for
the under-hand grip is somewhat surprising, as Fleishman found that
pull-ups had test-retest reliability of r = 0.93 when used with recruits
at Great Lakes Naval Training Center.
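
The differences attributed to McGraw can be checked directly from the
four means quoted above. The short sketch below reproduces the
arithmetic; only the four means come from the text, and the function
names are illustrative.

```python
# Mean pull-up scores reported by McGraw for college men, as quoted in the text.
MEANS = {
    ("first", "under"): 9.94,
    ("first", "over"): 8.02,
    ("second", "under"): 10.84,
    ("second", "over"): 9.82,
}


def grip_difference(occasion):
    """Under-hand mean minus over-hand mean on one occasion."""
    return round(MEANS[(occasion, "under")] - MEANS[(occasion, "over")], 2)


def day_to_day_change(grip):
    """Second-occasion mean minus first-occasion mean for one grip."""
    return round(MEANS[("second", grip)] - MEANS[("first", grip)], 2)


if __name__ == "__main__":
    print("grip difference:", grip_difference("first"), grip_difference("second"))
    print("day-to-day change:", day_to_day_change("under"), day_to_day_change("over"))
```

The day-to-day change for the over-hand grip (1.80) is nearly as large
as the larger of the two grip differences (1.92), which is the basis of
McGraw's conclusion.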

It is not clear whether the Army Air Force cadets, college
men and Navy recruits are from the same population as are Marine
Corps combat troops. To clarify this point, it was necessary to determine
the correlation of the two styles and the test-retest reliability of
this item when used with typical Marine Corps infantrymen.

Forty-eight men from "G" Company, 2d Battalion, 2d Marine
Division, stationed at Camp Lejeune, served as volunteer subjects.*
However, only 31 completed all tests and are reported on here. Pull-ups
were included in their routine daily physical training program, so
that the problem of muscle soreness or of fatigue from unaccustomed
exercise was not a factor in the findings.

On 26 January 1965 half of the subjects performed chins using
the under-hand grip. The other half used the over-hand grip. The
following day the tests were repeated, with the men reversing their
grip. On the third and fourth days respectively the testing program of
the first and second days was replicated. A half point was counted if a
man could get his upper arms parallel to the ground although he could
not get his chin over the bar.
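
The four-day counterbalanced schedule described above can be laid out
as in the sketch below. The subject labels are hypothetical, and the
way an odd-sized group is split is an assumption, since the report does
not say how the halves were formed.

```python
def crossover_schedule(subjects):
    """Return {(subject, day): grip} for the four-day counterbalanced design.

    The first half starts with the under-hand grip and the second half with
    the over-hand grip; days 3 and 4 replicate days 1 and 2. With an odd
    number of subjects the extra man falls in the second half (an assumption).
    """
    half = len(subjects) // 2
    first_half, second_half = subjects[:half], subjects[half:]
    grips_by_day = {
        1: ("under-hand", "over-hand"),
        2: ("over-hand", "under-hand"),
        3: ("under-hand", "over-hand"),
        4: ("over-hand", "under-hand"),
    }
    schedule = {}
    for day, (first_grip, second_grip) in grips_by_day.items():
        for subject in first_half:
            schedule[(subject, day)] = first_grip
        for subject in second_half:
            schedule[(subject, day)] = second_grip
    return schedule


if __name__ == "__main__":
    plan = crossover_schedule(["A", "B", "C", "D"])  # hypothetical labels
    for day in range(1, 5):
        print(day, {s: plan[(s, day)] for s in "ABCD"})
```

Each man performs each grip twice, and on every day both grips are
represented, so that day-to-day conditions do not favor one style.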


* The authors are indebted to 1st Lt. F. Leroy Scovill, III, USMC, of
"G" Company, for his cooperation in this phase of the study.


A number of drill instructors had commented that pull-up
scores appeared to be inversely related to the man's body weight. It is
almost self-evident that this will be true when the body weight includes
a high percentage of fat. The relationship between these two variables
when all subjects are in a state of vigorous physical training and their
weight is presumably predominantly lean body mass is less apparent.
To answer this question, the body weight of the subjects was also recorded
on the occasion of the first test.


During the tests it was determined that the men included rope
climbing in their physical fitness training program. Advantage was
taken of this to examine the relationships among pull-ups,
body weight and rope climbing, with and without packs, since there is
a "face validity" between pull-ups and rope climbing. On 4 February
1965 the subjects performed the rope climb in accordance with the
instructions laid down in the Marine Corps Physical Fitness Test.
The rope was 20 feet long, 1-1/2 inches in diameter and knotted approximately
every 2 feet. The man started in the standing position and
grasped the rope as high as he could reach. The stop watch was started
when the command "Go" was given and stopped when the man touched
the beam from which the rope was suspended. Half of the men climbed
with utilities, boots, helmet, light marching pack, and organic weapon
and belt; the other half wore utilities and belts only. The following day
this was reversed, so that those who had climbed with helmets, pack
and weapon climbed with utilities and belts only, and the others climbed
with the required gear. The position of the first two or three knots in
relation to the height of the men makes for differences in starting styles.
A tall man may be able to reach above a knot and secure a comfortable
grip prior to the starting signal, while a short man may have to await
the starting signal and jump to secure his handhold. For this reason
times were recorded only to the nearest 0.5 second.


During this whole series of tests the weather was in the low
thirties. The men's hands were chilled, the pull-up bar was cold, and
the ropes were not only cold but stiff. As a result, scores are probably
lower and times higher than would have been the case in more moderate
weather.


The mean data for the two methods of performing the pull-ups
are shown in Table 5. For comparative purposes the mean scores of
the first pull-up attempts of all groups reported by the investigators
cited above are shown in Table 6.


Table 5

Intercorrelations of Pull-ups


                      Over-hand Grip      Under-hand Grip
Trial          n       Mean     SD         Mean     SD        Correlation

First          49       [illegible]         9.9     3.0       [illegible]
Second         49       [illegible]        10.2     3.0       [illegible]
Correlation                0.95               0.93


Table 6

Comparison of Pull-up Scores of Various Groups


                               Over-hand Grip      Under-hand Grip
Subjects               N        Mean     SD         Mean     SD

Aviation Cadets       4057      8.52    3.09        9.45    3.29
Aviation Cadets       3445      8.20    2.94        9.17    3.17
College Men24          144      7.63                9.71
College Men27           51      8.02                9.94
Navy Recruits          201                          5.94    3.41
Marine Troops           49      7.00    3.00        9.90    3.00


Inspection of these data would suggest that there is comparatively
little difference between aviation cadets, college men, and Marine
Corps troopers, but the stated figures very likely underestimate the
comparative abilities of the Marines. These men were tested while
wearing fatigue clothes and field boots. It is assumed that all other
groups were tested in gym costume and tennis shoes. Quite likely they
also had the additional advantage of more favorable environmental
conditions. Acceptance of the mean figure of 9.9 would place the
Marines at the 70th percentile on Fleishman's national norms, but under
similar test conditions they would probably rate five or ten points higher
than this. In any event, they seem distinctly superior to the Navy
recruits, indicating that norms based on naval personnel are not
necessarily valid for Marine Corps troops.

As has been true in all previous studies, our subjects made
higher scores with the under-hand grip than with the over-hand grip.
What is of primary interest is that the correlations of first under-hand
versus first over-hand, first under-hand versus second over-hand,
second under-hand versus first over-hand, and second under-hand
versus second over-hand are all r = 0.89 or better (Table 5). This
indicates that men who do well on one style will also do well on the
other. Since the type of grip will have relatively little effect on the
relative placement of the individual being tested, it is desirable to use
the under-hand grip, which yields the better distribution of scores.
As a minor benefit, this would make it possible to
compare the scores of Marines with pre-determined national norms.15


The correlation for first under-hand versus second under-hand
is identical with that reported by Fleishman, r = 0.93, and practically
identical with that for first over-hand versus second over-hand, r = 0.95.
This indicates that for subjects accustomed to practicing pull-ups in
their regular physical training this test has a high reliability.
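
The reliability figures quoted here are ordinary product-moment correlations between a man's first and second trial with the same grip. A minimal Python sketch of that computation is given below. The function name `pearson_r` and the five pairs of scores are invented purely for illustration; they are not data from this study.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical first- and second-trial pull-up counts (NOT study data).
first_trial = [6, 9, 10, 12, 15]
second_trial = [7, 9, 11, 12, 14]
reliability = pearson_r(first_trial, second_trial)
print(f"test-retest r = {reliability:.2f}")
```

A value near 1.0 means the men keep nearly the same rank order from one trial to the next, which is the sense in which the text calls the test reliable.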

The mean body weight of our subjects was 165.6 pounds
(S.D. = 18.2 pounds). When the under-hand grip is used, the correlation
for the first pull-up scores versus body weight is r = -0.09; with
the over-hand grip, it is r = -0.21 (Table 7). Thus when dealing with
well-conditioned troops, the influence of body weight on the scores is
negative, but to such a limited degree as to be of little consequence.
There is, then, no need to take body weight into consideration when
giving pull-up tests to trained Marine Corps troops. The situation with
untrained recruits may be different and requires further study.

The picture is almost identical insofar as the effect of body weight on


The first over-hand grip scores and rope climb with pack
correlate r = -0.49. With the under-hand grip the correlation is
r = -0.51, which is again essentially identical (Table 8). (The negative


Table  7 


Intercorrelation of Body Weight, Pull-ups, and Rope Climb (N = 32)
Table 8


Intercorrelations of First Pull-ups and Rope Climb


                         Rope Climb without Pack (sec)    Rope Climb with Pack (sec)
Style of Pull-up    N    Mean    SD    Correlation        Mean    SD    Correlation
Over-hand Grip      31   [illegible in source]            [illegible]   -0.49
Under-hand Grip     31   [illegible in source]            [illegible]   -0.51


correlation here may be confusing at first sight, but reflects the fact
that in rope climbing a decreased time constitutes a superior
performance.) This shows a moderate but substantial relationship between
performance in the two events. The coefficient of determination* must
then be on the order of 25%, which indicates that the two events are not
entirely orthogonal. With a correlation as great as r = 0.50, little
additional information would be gained by including both items in a
single test battery. However, the proportion that is independent is so
large that performance in one will not serve to predict satisfactorily
performance in the other. This confirms Fleishman's statement that
the rope climb has a large factor loading with dynamic strength.
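
As a worked check on the "order of 25%" figure, squaring the two pull-up versus rope-climb-with-pack correlations quoted in the text gives:

```latex
r^2 = (-0.49)^2 \approx 0.24, \qquad r^2 = (-0.51)^2 \approx 0.26
```

On either grip, then, roughly one quarter of the individual differences in one event is associated with differences in the other, and about three quarters is not.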

While Fleishman recommends use of the 600-yard run-walk
as a measure of stamina, this seems to have been an afterthought. This
event is not included in his tables of intercorrelations between tests.
Hence the first step was to determine whether it was in fact orthogonal
to the other events in the Fleishman battery. Data were therefore
collected on the test scores of 275 recruits at Parris Island early in 1965.
The intercorrelations are displayed in Table 9, from which it is clear
that this test is orthogonal to the other items.

Even with this established, another problem was evident. The
Initial Strength Test employs the 300-yard shuttle run as a measure of
cardiorespiratory endurance, the Fleishman tests utilize the 600-yard
run-walk, and the Physical Readiness Test incorporates the 3-mile


* The coefficient of determination is defined as r². It represents the
percentage of individual differences in one variable which is associated
with or determined by the individual differences in another variable.
Table 9


Intercorrelations Between Selected Test Items
in Fleishman Tests (N = 275)


Test Item      Hand Grip   Shuttle Run   Leg Lift   600-yd Run-Walk
Pull-ups         0.07        -0.07         0.40        -0.33
Hand Grip                    -0.08         0.11        -0.18
Shuttle Run                               -0.17         0.15
Leg Lift                                               -0.33


forced march. It is not self-evident that these measure the same factor;
in fact, Cureton apparently considers that the 300-yard shuttle run is
a measure of explosive strength rather than of cardiorespiratory
endurance. To clarify this point, 61 Camp Lejeune Marines served as
subjects. The prescribed gear was worn during the 3-mile forced
march. The other two tests were performed in gym suits and sneakers.
The men ran each of the three events on a Latin square design and the
times were intercorrelated. The means and standard deviations are
shown in Table 10. Intercorrelations between the three scores are
displayed in Table 11.
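
A Latin square design balances the order of events so that each event is run first, second, and third equally often across groups of subjects. The report does not say how the orders were built or how men were assigned to them, so the following Python sketch is only one plausible arrangement: the cyclic construction and the "subject k follows order k mod 3" rule are assumptions made for illustration.

```python
def cyclic_latin_square(items):
    """Return a cyclic Latin square: row i is the item list rotated left by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

events = ["300-yd shuttle run", "600-yd run-walk", "3-mi forced march"]
orders = cyclic_latin_square(events)

# Hypothetical assignment rule: subject k follows order k mod 3.
# The count of 61 subjects is taken from the text; the rule is not.
assignments = {subject: orders[subject % len(orders)] for subject in range(61)}

for i, order in enumerate(orders, start=1):
    print(f"Order {i}: " + " -> ".join(order))
```

Each event appears exactly once in every row and every column of `orders`, which is the defining property of a Latin square.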


Table 10


Means and Standard Deviations of Running Events


Event                   Mean        SD
300-yd Shuttle Run      52.7 sec    2.3 sec
600-yd Run-Walk         89.8 sec    4.4 sec
3-mi Forced March       32.0 min    2.2 min


Table 11


Intercorrelations of Running Events (N = 61)


Test Item           600-yd Run-Walk    3-mi Forced March
Shuttle Run              0.67               0.17
600-yd Run-Walk                             0.34

It is clear that the 300-yard shuttle run and the 600-yard
run-walk are substantially related, and that neither of them is of any
value in predicting performance in the 3-mile forced march. It was
considered possible that the load represented by the gear required in
the Physical Readiness Test might have some influence on the correlations
between the 600-yard run-walk and the 3-mile forced march. To
test this hypothesis, 10 of the original 61 subjects ran the same 600-yard
course again, this time dressed as prescribed by the Physical Readiness
Test. Their times were correlated with those previously determined for
the 3-mile forced march. The means and standard deviations are displayed
in Table 12 and the correlation (r = 0.35) is shown in Table 13.
Since the addition of the load made no essential difference in the
correlation, it strongly suggests that the essential point involved is the
distance. It would appear that both the 300-yard shuttle run and the
600-yard run-walk are largely measures of explosive strength. If the
3-mile forced march is a valid criterion of the type of cardiorespiratory
endurance desirable in the Marine Corps combat infantryman, neither
the Initial Strength Test nor the Fleishman Test battery includes an
item which measures this parameter.
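
One way to see why the short runs are of little use in predicting the forced march is to convert each reported correlation into the percentage of shared variance (r squared). The Python sketch below does this for the coefficients reported in Tables 11 and 13; the dictionary layout and the helper name `shared_variance_pct` are illustrative choices, not part of the original analysis.

```python
# Correlations as reported in Tables 11 and 13 of this report.
correlations = {
    ("300-yd shuttle run", "600-yd run-walk"): 0.67,
    ("300-yd shuttle run", "3-mi forced march"): 0.17,
    ("600-yd run-walk", "3-mi forced march"): 0.34,
    ("600-yd run-walk in light pack", "3-mi forced march"): 0.35,
}

def shared_variance_pct(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return 100.0 * r * r

shared = {pair: shared_variance_pct(r) for pair, r in correlations.items()}
for (a, b), pct in shared.items():
    print(f"{a} vs {b}: r^2 = {pct:.0f}% shared, {100 - pct:.0f}% unshared")
```

The two short runs share about 45% of their variance with each other, while neither shares more than about 12% with the forced march, with or without the pack.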


Table 12


Means and Standard Deviations of 600-yard Run-Walk (sec)


Conditions          N     Mean     SD
In gym costume      61    89.8     4.4
In light pack       10    111.2    7.8


Table 13

Correlation of 600-yard Run-Walk in Light Combat Pack
and 3-mile Forced March
(N = 10)


Event                   Mean
600-yard Run-Walk       111.2 sec
3-mile Forced March     32.7 min

Correlation (r)         0.35


Work lasting up to approximately 1 minute is said to depend
largely  on  anaerobic  work  capacity  (ability  to  liberate  energy  in  the 
absence  of  oxidation)  while  longer  periods  are  controlled  by  aerobic 
work  capacity.  Perhaps  the  ability  to  perform  work  under  aerobic 
conditions  cannot  be  predicted  by  tests  completed,  or  largely  completed, 
under  anaerobic  conditions.  If  so,  it  may  be  expected  that  a  run  of  at 
least  1/2  mile  will  be  required  to  predict  the  time  of  the  3-mile  forced 
march  to  any  usable  degree. 

On the basis of Fleishman's work, it would appear that the
events comprising the Initial Strength Test may be classified as follows:


Pull-ups                  Dynamic strength (arms)
Bend and thrusts          Dynamic strength (legs)
Push-ups                  Dynamic strength (arms)
Sit-ups                   Trunk strength (weak measure)
300-yard shuttle run      Explosive strength
Side straddle hops        ?


The primary problem is to determine just what kind and how
much fitness a combat Marine needs. The writers have heard one
officer argue that he actually needs very little, because most of his
time is spent crouching in a shell hole, from which he emerges only
to run a few yards to another protected spot. Unquestionably, much of
the fatigue of combat is psychological, resulting from fear, hunger,
shock, panic, mental fatigue, and loss of sleep.3* The extent to which
these can be offset by physical conditioning is unknown. It is quite
possible that the problem is primarily one of getting men to a given
area in condition to fight. The British Royal Marines use speed march-

mile in 10 minutes as one of their criteria.53 A British Royal Marine
captain now at Camp Lejeune has informed one of the writers that on
the basis of his experience in three different campaigns he considers
this quite a satisfactory measure of cardiorespiratory fitness for combat.
In his personal opinion, the 3-mile forced march is not satisfactory
for this purpose. It will be noted that the mean time of our 61 subjects
in the 3-mile forced march was 32.0 minutes (Table 10). In the opinion
of the observers, their condition at the end of this run was such as to
render it highly unlikely that they could have sustained this pace for
another 6 miles. By British Royal Marine standards, these men would
almost certainly require further conditioning. The proper approach to
this problem would seem to be the direct one: actually measure a
group of men who have demonstrated their fitness by successful
participation in arduous combat patrols and similar maneuvers in Viet Nam
and determine their performance capabilities.
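
The comparison with the British criterion comes down to simple pace arithmetic. The Python sketch below works it through under one stated assumption: that the British standard is a 10-minute mile held over 9 miles, that is, the 3 miles run here plus the "another 6 miles" mentioned above. The total distance is inferred from that phrase and is not stated in the surviving text.

```python
def pace_min_per_mile(total_minutes, miles):
    """Average pace in minutes per mile."""
    return total_minutes / miles

# Reported in Table 10: mean 3-mile forced march time of 32.0 minutes.
observed_pace = pace_min_per_mile(32.0, 3)

# ASSUMED reading of the British criterion: 1 mile per 10 minutes,
# sustained over 9 miles (3 miles plus "another 6 miles").
british_pace = 10.0
assumed_total_miles = 9

time_at_observed_pace = observed_pace * assumed_total_miles
time_at_british_pace = british_pace * assumed_total_miles

print(f"Observed pace: {observed_pace:.2f} min/mile")
print(f"{assumed_total_miles} miles at observed pace: {time_at_observed_pace:.0f} min")
print(f"{assumed_total_miles} miles at British pace: {time_at_british_pace:.0f} min")
```

On these figures the subjects were already slower than a 10-minute mile over the first 3 miles, which is consistent with the observers' judgment that further conditioning would be needed.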
The British standards are, of course, set for their commando-oriented
Royal Marines. In the U.S. Marine Corps, which is much
larger, we have specialized troops, such as tankers, artillerymen, and
supply personnel. It is quite possible that such specialists have no need
for the same level of physical fitness that is required by the combat
infantrymen. In such case it would be comparatively simple to establish
different requirements for different branches of the Corps.

From the standpoint of modern athletic training theory, the
ingredient that seems to be commonly missing from the Marines' fitness
training is all-out effort. The troops at Camp Lejeune do a great
deal of double timing, but it is extremely rare to see them running.
However, it is precisely this level of stress which is needed in order
to achieve high levels of fitness. It is suggested that attention might
well be given to the introduction of interval training into the conditioning
program of the Marine Corps.34 One difficulty with this is that the
"all-out" effort of men varies and the troops tend to become so spread
out that military control is lost. Some modification would probably be
required in order to keep the men under the control of their officers.


SUMMARY 

1. At the present time there is no general agreement as to
what  kinds  of  fitness  and  what  levels  of  fitness  are  needed  by  combat 
troops.  Until  a  decision  has  been  reached  on  this  point,  it  will  be 
impossible  to  develop  meaningful  test  batteries. 


2. The Physical Fitness Readiness Test makes certain
assumptions which are open to question and includes items which may
be repetitious. Since it is based on minimum performances and permits
differences in administration, it cannot be used to compare groups
or to measure changes in condition.

3. The Initial Strength Test has a dry weather and a wet
weather battery. The two are not equivalent. The scoring tables are
to some extent inequitable. The battery is heavily weighted with dynamic
strength tests and lacks static strength and stamina tests. Use of a
different technique in the pull-up would improve the distribution of the
scores. A different method of scoring would be more informative for
both testers and testees.

4. The Fleishman Tests appear to have a sound theoretical
basis. Under certain conditions, the 100-yard shuttle run is unsatisfactory
from an administrative viewpoint. The 600-yard run-walk is
not a satisfactory measure of cardiorespiratory endurance if the 3-mile
forced march is used as the criterion.